Author: Caspar David Peter

Affiliation: Rotterdam School of Management, Accounting Department

Data Analytics for Finance

BM17FI · Rotterdam School of Management

Stata Cheat Sheet

All commands used in Assignments 1–6

Note: How to use this reference

This page lists every Stata command used in the course assignments, grouped by category. Each entry includes a short description, a minimal example, and a link to the official documentation. The examples are illustrative: adapt variable names and options to your own data.

Environment & Setup

Commands for preparing your Stata session before working with data.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| clear | Remove data (and optionally all objects) from memory | clear all | 📖 |
| set more off | Disable output pagination so results scroll continuously | set more off | 📖 |
| set scheme | Set the default appearance for all subsequent graphs | set scheme stcolor | 📖 |
| set linesize | Set the width (in characters) of the output window | set linesize 120 | 📖 |
| pwd | Print the current working directory | pwd | 📖 |
| global | Define a global macro, accessible anywhere in your session | global datadir "$base/data" | 📖 |
| local | Define a local macro, accessible only within the current context | local cutoff = td(1jan2020) | 📖 |
| scalar | Store a single numeric or string value | scalar threshold = 0.05 | 📖 |
| display | Print text, macro values, or expressions to the console | display "Obs: " _N | 📖 |
| ssc install | Install a user-written package from the SSC archive | ssc install estout, replace | 📖 |

Example

* Typical session setup
clear all
set more off
set scheme stcolor

global base "`c(pwd)'"
global raw  "$base/data/raw"

display "Working directory: $base"
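
Later examples on this page refer to `$processed` and `$figures`, which the setup above does not define. A minimal sketch of one way to define them follows; the folder names `data/processed` and `figures` are assumptions chosen for illustration, not a layout prescribed by the course.

```stata
* Hypothetical folder layout; adjust the names to your own project
global base      "`c(pwd)'"
global raw       "$base/data/raw"
global processed "$base/data/processed"
global figures   "$base/figures"

* Create the output folders if they do not exist yet
* (capture ignores the error raised when a folder is already there)
capture mkdir "$base/data"
capture mkdir "$base/data/processed"
capture mkdir "$base/figures"

display "Processed data: $processed"
display "Figures:        $figures"
```

mkdir creates one level at a time, which is why `data` is created before `data/processed`.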

Data Import & Export

Commands for reading data into Stata and saving results to disk.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| import delimited | Import a CSV (or other delimited) file | import delimited "$raw/sales.csv", clear | 📖 |
| use | Load a Stata .dta file | use "$processed/panel.dta", clear | 📖 |
| save | Save the current dataset as a .dta file | save "$processed/clean.dta", replace | 📖 |
| erase | Delete a file from disk | erase "$processed/temp.dta" | 📖 |

Example

* Import a CSV file and save as Stata format
import delimited "$raw/monthly_sales.csv", clear varnames(1)
save "$processed/monthly_sales.dta", replace

Data Exploration

Commands for inspecting and understanding your dataset.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| describe | Show variable names, types, labels, and storage info | describe | 📖 |
| list | Display selected observations in a table | list id date price in 1/5 | 📖 |
| count | Count observations, optionally with a condition | count if missing(revenue) | 📖 |
| summarize | Compute summary statistics (mean, sd, min, max, etc.) | summarize price, detail | 📖 |
| tabulate | Produce a one-way or two-way frequency table | tabulate industry year | 📖 |
| tabstat | Display compact summary statistics, optionally by group | tabstat revenue profit, by(region) stat(n mean sd) | 📖 |
| table | Flexible table of statistics | table region year, stat(mean sales) nformat(%9.2f) | 📖 |
| correlate | Display a correlation matrix | correlate x1 x2 x3 | 📖 |
| misstable summarize | Summarize patterns of missing data | misstable summarize revenue profit | 📖 |
| duplicates report | Report duplicate observations | duplicates report id date | 📖 |

Example

* Quick data audit
describe
summarize price volume, detail
tabstat price volume, by(industry) stat(n mean sd min max)
count if missing(price)
misstable summarize

Data Management: Variables

Commands for creating, modifying, and labeling variables.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| generate | Create a new variable | gen log_sales = ln(sales) | 📖 |
| replace | Modify values of an existing variable | replace status = 1 if year >= 2020 | 📖 |
| drop | Remove variables or observations | drop temp_var | 📖 |
| keep | Keep only specified variables or observations | keep if year >= 2015 | 📖 |
| rename | Rename a variable | rename total_assets ta | 📖 |
| order | Reorder variables in the dataset | order id date price volume | 📖 |
| encode | Convert a string variable to a labeled numeric variable | encode country, gen(country_num) | 📖 |
| format | Set the display format of a variable | format date %td | 📖 |
| label variable | Attach a descriptive label to a variable | label variable log_sales "Log of sales" | 📖 |
| label define | Define a named set of value labels | label define yesno 0 "No" 1 "Yes" | 📖 |
| label values | Assign a value-label set to a variable | label values treated yesno | 📖 |
| xtile | Create quantile-based categories (terciles, quartiles, etc.) | xtile size_q = total_assets, nq(3) | 📖 |

Example

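* Create and label a binary treatment indicator
* (the if qualifier keeps post missing where date is missing;
*  without it, missing dates would be coded as 1)
gen post = (date >= td(1jan2020)) if !missing(date)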
label variable post "Post-treatment indicator"
label define post_lbl 0 "Pre" 1 "Post"
label values post post_lbl
tabulate post
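
One caveat for indicators built from inequalities: Stata treats numeric missing values as larger than any number, so a condition such as `x >= 3` is also true when `x` is missing. A minimal sketch with made-up data (the variable `x` and its three values are purely illustrative):

```stata
clear
input x
1
5
.
end

count if x >= 3                  // also counts the missing value
assert r(N) == 2

count if x >= 3 & !missing(x)    // counts only the observed value 5
assert r(N) == 1

* Indicator that stays missing where x is missing
gen high = (x >= 3) if !missing(x)
list
```

The same logic applies to `replace ... if` and `keep if` with `>` or `>=` conditions.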

Data Management: Observations & Datasets

Commands for sorting, merging, reshaping, and aggregating data.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| sort | Sort observations by one or more variables | sort firm_id date | 📖 |
| merge | Merge the current dataset with another on key variable(s) | merge m:1 firm_id using "firms.dta" | 📖 |
| collapse | Aggregate data to a summary level | collapse (mean) avg_ret=ret, by(industry year) | 📖 |
| reshape | Reshape data between wide and long formats | reshape wide sales, i(firm_id date) j(product) | 📖 |
| preserve | Save a snapshot of the current data in memory | preserve | 📖 |
| restore | Restore data from a previous preserve | restore | 📖 |

Example: Merging datasets

* Merge daily stock data with firm characteristics
use "daily_prices.dta", clear
merge m:1 firm_id using "firm_characteristics.dta"
tabulate _merge
keep if _merge == 3
drop _merge
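
When every observation is expected to find a match, the check can be made explicit with merge's `assert()` option, which stops with an error if any other merge result occurs. The sketch below uses Stata's bundled auto dataset and a small lookup table built on the fly; the variable `origin` and the temporary file `lookup` are illustrative names, not course data.

```stata
sysuse auto, clear

* Build a hypothetical lookup table with one row per value of foreign
preserve
keep foreign
duplicates drop
gen str8 origin = cond(foreign == 1, "Import", "Domestic")
tempfile lookup
save `lookup'
restore

* m:1 merge; assert(match) errors out unless every observation matches,
* and nogenerate skips creating the _merge variable
merge m:1 foreign using `lookup', assert(match) nogenerate
tabulate origin
```

If unmatched observations are legitimate in your data, inspecting `_merge` as in the example above remains the safer route.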

Example: Collapse and reshape

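* Compute average price by firm and year, then reshape to wide
preserve
collapse (mean) avg_price=price, by(firm_id year)
reshape wide avg_price, i(firm_id) j(year)
list in 1/5    // inspect (or save) the wide table before restoring
restore        // discards the collapsed table and brings back the original data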

By-group Operations

Commands that operate separately within groups defined by one or more variables.

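| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| bysort / by | Execute a command separately for each group (by requires the data to be sorted by the group variable already; bysort sorts first) | bysort firm_id (date): gen cumret = sum(ret) | 📖 |
| egen | Extended generate: group-aware functions | bysort firm_id: egen avg_ret = mean(ret) | 📖 |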

Example

* Calculate running sum and group mean within each firm
sort firm_id date
bysort firm_id (date): gen cum_return = sum(daily_ret)
by firm_id: egen firm_avg_ret = mean(daily_ret)
by firm_id: egen firm_sd_ret  = sd(daily_ret)

Tip: egen functions

Common egen functions: mean(), sd(), max(), min(), total(), count(), rank(). These calculate statistics within the groups defined by the by prefix.
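
The difference between gen with sum() and egen with total() is easy to miss: the former gives a running total that changes from row to row, the latter a single constant per group. A small sketch on Stata's bundled auto dataset (used only for illustration; the new variable names are arbitrary):

```stata
sysuse auto, clear

* Running total of price within each group, in ascending price order
bysort foreign (price): gen double running_price = sum(price)

* Group total: the same value on every row of the group
bysort foreign: egen double total_price = total(price)

* The last running value in each group equals the group total
bysort foreign (price): assert running_price == total_price if _n == _N

list foreign price running_price total_price in 1/5
```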

Panel Data Setup

Commands for declaring and inspecting panel (longitudinal) data structures.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| xtset | Declare the panel variable and time variable | xtset firm_id date | 📖 |
| xtdescribe | Describe the panel structure (balance, gaps, span) | xtdescribe | 📖 |
| L. | Time-series lag operator (requires xtset or tsset) | gen ret = ln(price / L.price) | 📖 |

Example

* Declare panel and compute log returns
xtset firm_id date
xtdescribe
gen log_return = ln(price / L.price)
label variable log_return "Daily log return"

Tip: Time-series operators

After xtset, you can use: L. (lag), L2. (two-period lag), F. (lead), D. (first difference). These operate within each panel unit automatically.
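
One practical caveat: L. looks up the observation whose time value is exactly one unit earlier, so with a daily calendar date as the time variable the lag is missing after weekends and holidays. A minimal sketch, using Stata's bundled sp500 example dataset (not part of the course data) and a consecutive trading-day counter `t`, a name chosen here for illustration:

```stata
sysuse sp500, clear
sort date

* Calendar-date time variable: the lag is missing whenever the previous
* calendar day is not in the data (e.g. on Mondays)
tsset date
gen ret_calendar = ln(close / L.close)

* Trading-day counter: consecutive integers, so L. is the previous trading day
gen t = _n
tsset t
gen ret_trading = ln(close / L.close)

count if missing(ret_calendar)   // first observation plus every gap
count if missing(ret_trading)    // only the first observation

* For panel data, build the counter within each unit instead:
*   bysort firm_id (date): gen t = _n
*   xtset firm_id t
```

Which definition is appropriate depends on the assignment; the counter approach treats a return over a weekend like any other one-day return.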

Statistical Tests

Commands for hypothesis testing and distributional checks.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| ttest | One- or two-sample t-test | ttest score, by(treatment) | 📖 |
| swilk | Shapiro–Wilk test for normality | swilk residuals | 📖 |
| estat hettest | Breusch–Pagan test for heteroskedasticity (after regress) | estat hettest | 📖 |

Example

* Compare mean exam scores between two groups
ttest exam_score, by(study_group)

* After a regression, check residual normality
regress y x1 x2
predict resid, residuals
swilk resid

Regression Analysis

Core commands for estimating linear models.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| regress | OLS linear regression | regress y x1 x2 x3, robust | 📖 |
| xtreg | Panel data regression (fixed or random effects) | xtreg y x1 x2, fe vce(cluster firm_id) | 📖 |
| predict | Generate predicted values or residuals after estimation | predict yhat, xb | 📖 |

Example: OLS with robust standard errors

regress wage education experience age, robust
predict wage_hat
predict resid, residuals

Example: Panel fixed-effects regression

xtset firm_id year
xtreg revenue marketing_spend rd_spend, fe vce(cluster firm_id)

Storing & Comparing Estimates

Commands for saving regression results and displaying them side by side.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| estimates store | Store the current estimation result under a name | estimates store model1 | 📖 |
| estimates restore | Restore a previously stored estimation | estimates restore model1 | 📖 |
| estimates table | Display stored estimates side by side | estimates table model1 model2, star | 📖 |
| estimates dir | List all stored estimation results | estimates dir | 📖 |

Example

* Run two specifications and compare
regress y x1 x2, robust
estimates store ols_base

regress y x1 x2 x3 x4, robust
estimates store ols_full

estimates table ols_base ols_full, star stats(N r2_a)

Formatted Regression Tables

Commands from the estout package for producing publication-ready tables.

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| eststo | Store an estimation result (shorthand) | eststo m1: regress y x1, robust | 📖 |
| esttab | Export a formatted regression table (screen, LaTeX, CSV, …) | esttab m1 m2 using "table.tex", se label replace | 📖 |
| estpost | Post results from non-estimation commands for use with esttab | estpost summarize y x1 x2 | 📖 |

Note: Install with ssc install estout, replace. Full documentation: estout homepage.

Example

* Build a regression table with three models
eststo clear
eststo m1: regress y x1, robust
eststo m2: regress y x1 x2, robust
eststo m3: regress y x1 x2 x3, robust

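* esttab reports the number of observations by default,
* so only R-squared has to be requested explicitly
esttab m1 m2 m3, ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label r2 ///
    title("Regression Results")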

Example: Summary statistics table

estpost summarize price volume market_cap
esttab, cells("mean(fmt(2)) sd(fmt(2)) min max count") nomtitle nonumber

Graphics: Core Plot Types

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| twoway line | Line plot (typically for time series) | twoway line price date, title("Price Over Time") | 📖 |
| twoway scatter | Scatter plot | scatter y x, mlabel(name) | 📖 |
| twoway lfit | Overlay a linear-fit line | twoway (scatter y x) (lfit y x) | 📖 |
| twoway function | Plot an arbitrary function | twoway function y=0, range(x) lcolor(red) | 📖 |
| twoway rcap | Range plot with capped spikes (confidence intervals) | twoway rcap ci_hi ci_lo x | 📖 |
| histogram | Histogram with optional normal-density overlay | histogram ret, normal bin(40) | 📖 |
| graph export | Save the current graph to a file (PNG, PDF, SVG, …) | graph export "fig.png", replace width(1200) | 📖 |

Example: Time series with event marker

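* The local event_date must hold a Stata date; the value below is a
* hypothetical placeholder, replace it with your own event date
local event_date = td(15mar2020)

twoway line price date if firm_id == 42, ///
    title("Daily Stock Price") ///
    xtitle("Date") ytitle("Price (EUR)") ///
    xline(`event_date', lpattern(dash) lcolor(red))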

graph export "$figures/price_plot.png", replace width(1400)

Example: Multi-series comparison

twoway ///
    (line price date if industry == "Tech",  lcolor(navy)) ///
    (line price date if industry == "Banks", lcolor(maroon)), ///
    legend(label(1 "Technology") label(2 "Banking") ///
           position(6) cols(2)) ///
    xtitle("Date") ytitle("Price (EUR)")

Example: Scatter with fitted line

twoway (scatter wage education) ///
       (lfit wage education, lcolor(red)), ///
    title("Wages vs. Education") ///
    xtitle("Years of Education") ytitle("Hourly Wage")

Graphics: Common Options

A quick reference for the most-used graph options across the assignments.

| Option | Purpose | Example |
| --- | --- | --- |
| title() | Main graph title | title("Stock Returns") |
| subtitle() | Subtitle below the title | subtitle("2015–2020") |
| xtitle() / ytitle() | Axis labels | xtitle("Date") |
| note() | Footnote below the graph | note("Source: Compustat") |
| legend() | Control the legend | legend(label(1 "Firm A") position(6) cols(2)) |
| xline() / yline() | Add reference lines | xline(21550, lpattern(dash) lcolor(red)) |
| lcolor() | Line colour | lcolor(navy) |
| lwidth() | Line thickness | lwidth(medium) |
| lpattern() | Line pattern (solid, dash, dot, …) | lpattern(dash) |
| mcolor() / mlabel() | Marker colour / labels | mcolor(navy) mlabel(name) |
| by() | Create a panel of graphs, one per group | histogram ret, by(firm) |
| name(, replace) | Store the graph in memory under a name | name(g1, replace) |
| xlabel() / ylabel() | Customise axis tick marks | xlabel(, format(%td) angle(45)) |

Tip: Line continuation

Use /// at the end of a line to continue the command on the next line. This keeps long graph commands readable.

Programming & Flow Control

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| foreach | Loop over a list of items | foreach v in x1 x2 x3 { summarize `v' } | [📖](https://www.stata.com/manuals/pforeach.pdf) |
| forvalues | Loop over a numeric range | forvalues i = 1/5 { display `i' } | 📖 |
| if / else | Conditional execution of code blocks | if _rc == 0 { display "OK" } | 📖 |
| capture | Run a command and suppress any error; stores return code in _rc | capture confirm file "data.dta" | 📖 |
| quietly | Run a command but suppress all output | quietly regress y x1 | 📖 |
| assert | Assert that a condition holds; error if it does not | assert _N > 0 | 📖 |
| confirm | Confirm that a file or variable exists | confirm file "$raw/data.dta" | 📖 |
| levelsof | Store the unique values of a variable in a local macro | levelsof region, local(regions) | 📖 |
| return list | Display saved results from the last r-class command | return list | 📖 |

The foreach, forvalues, and if examples are condensed onto one line to fit the table. In a do-file, nothing may follow the opening brace on its line and the closing brace must stand on a line by itself, as in the example below.

Example

* Loop over variables and summarize each
foreach var in revenue profit assets {
    display "--- `var' ---"
    summarize `var'
}

* Check all expected files exist
foreach f in "q1.dta" "q2.dta" "q3.dta" {
    capture confirm file "$raw/`f'"
    if _rc != 0 {
        display as error "Missing: `f'"
    }
}
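
The table also lists levelsof and forvalues, which the example above does not exercise. A short sketch on Stata's bundled auto dataset (chosen only so the snippet runs as-is; the macro names `levels`, `r`, and `i` are arbitrary):

```stata
sysuse auto, clear

* Collect the distinct non-missing values of rep78 in a local macro
levelsof rep78, local(levels)

* Loop over those values; count leaves its result in r(N)
foreach r of local levels {
    quietly count if rep78 == `r'
    display "rep78 = `r': " r(N) " cars"
}

* Loop over a numeric range
forvalues i = 1/3 {
    display "Iteration `i'"
}
```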

Tip: Accessing saved results

Most Stata commands store results you can reuse:

  • r-class (e.g., summarize): access with r(mean), r(sd), r(N), etc.
  • e-class (e.g., regress): access with e(N), e(r2), e(cmd), etc.
  • Coefficients: _b[varname] and _se[varname] after any estimation command.
  • System values: _N (total obs), _n (current obs number), _rc (last return code).
  • System constants: c(pwd), c(k) (number of variables), c(N) (number of obs).
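
As a brief illustration of the list above, the sketch below pulls r() results after summarize and e() results and coefficients after regress. It uses Stata's bundled auto dataset rather than course data, and `price_demeaned` is an illustrative variable name.

```stata
sysuse auto, clear

* r-class: summarize leaves its results in r()
quietly summarize price
display "Mean price: " %9.2f r(mean)
gen price_demeaned = price - r(mean)

* e-class: regress leaves its results in e(), coefficients in _b[] and _se[]
quietly regress price mpg weight
display "N = " e(N) ", R-squared = " %5.3f e(r2)
display "t-statistic on mpg: " %6.2f _b[mpg]/_se[mpg]

* Show everything the last estimation command stored
ereturn list
```

Results in r() are overwritten by the next r-class command, so copy anything you need into a local or scalar right away.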

Functions Reference

Key functions used inside generate, replace, if, and other expressions.

Math functions

| Function | Description | Example |
| --- | --- | --- |
| ln(x) | Natural logarithm | gen log_assets = ln(total_assets) |
| exp(x) | Exponential (e^x) | gen level = exp(log_ret) |
| abs(x) | Absolute value | gen abs_ret = abs(ret) |
| sum(x) | Running (cumulative) sum; restarts for each group when used with bysort: gen | bysort id (date): gen cumsum = sum(ret) |

Date functions

| Function | Description | Example |
| --- | --- | --- |
| td(DDmonYYYY) | Convert a literal date string to a Stata date number | local d = td(15mar2020) |
| date(s, mask) | Parse a string variable to a Stata date number | gen stata_date = date(date_str, "YMD") |
| mdy(m, d, y) | Create a date from month, day, and year values | gen event = mdy(9, 18, 2015) |
| year(d) | Extract the year from a date | gen yr = year(date) |
| month(d) | Extract the month from a date | gen mo = month(date) |
| mofd(d) | Convert a daily date to a monthly date | gen month_date = mofd(date) |
| dofm(m) | Convert a monthly date to the first day of that month | gen first_day = dofm(month_date) |

String & logical functions

| Function | Description | Example |
| --- | --- | --- |
| missing(x) | Returns 1 if x is missing, 0 otherwise | count if missing(price) |
| inlist(x, a, b, …) | Returns 1 if x equals any listed value | keep if inlist(country, "DE", "FR", "NL") |
| strpos(s, sub) | Position of substring (0 if not found) | gen has_ag = strpos(name, "AG") > 0 |

Tip: Date formatting

After creating a Stata date variable, apply a display format so dates are human-readable: format date %td (daily), format month_date %tm (monthly).
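
The date functions above chain together naturally. The sketch below walks one made-up string date through parsing, formatting, and the daily-to-monthly round trip; the two input dates and all variable names are illustrative.

```stata
clear
input str10 date_str
"2020-03-13"
"2020-03-16"
end

* Parse the string, then format so the number displays as a date
gen date = date(date_str, "YMD")
format date %td

* Daily date -> monthly date -> first day of that month
gen month_date = mofd(date)
format month_date %tm
gen first_day = dofm(month_date)
format first_day %td

list

* The parsed value matches the td() literal for the same day
assert date[1] == td(13mar2020)
assert first_day[1] == td(1mar2020)
```

Without the format commands the variables hold the same values but display as plain integers (days or months since January 1960).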

User-Written Packages

These packages are not part of base Stata and must be installed before first use.

| Package | Description | Install | Docs |
| --- | --- | --- | --- |
| reghdfe | Linear regression with multiple levels of fixed effects | ssc install reghdfe, replace | 📖 |
| ftools | Fast Mata routines (required by reghdfe) | ssc install ftools, replace | 📖 |
| estout | Suite for formatted tables (esttab, eststo, estpost) | ssc install estout, replace | 📖 |
| coefplot | Coefficient plots from stored estimates | ssc install coefplot, replace | 📖 |
| distinct | Count the number of distinct values of a variable | ssc install distinct, replace | 📖 |
| rangestat | Calculate statistics over observation ranges / rolling windows | ssc install rangestat, replace | 📖 |

Example: reghdfe

* TWFE regression absorbing firm and time fixed effects
reghdfe outcome treatment controls, ///
    absorb(firm_id year#month) vce(cluster firm_id)

Example: rangestat

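* Rolling standard deviation of returns over the previous 252 calendar days
* (interval() counts in units of date, here date-252 through date-1;
*  for a window of 252 trading days, use a per-firm trading-day counter
*  as the interval variable instead of date)
rangestat (sd) rolling_sd = daily_ret, ///
    interval(date -252 -1) by(firm_id)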

Shell Commands

| Command | Description | Example | Docs |
| --- | --- | --- | --- |
| ! (prefix) | Execute an operating-system shell command from within Stata | !ls -lh "$figures" | 📖 |

Data Analytics for Finance

BM17FI · Academic Year 2025–26

Erasmus University Rotterdam

Created by: Caspar David Peter

© 2026 Rotterdam School of Management